els are shown in Table 3.12.
Factor Xa protease cleavage data classification
An important feature of the decision tree algorithms is that they can handle
non-numerical data. For instance, both CART and C5.0 can be applied to
the factor Xa protease cleavage data set directly without an encoding
step [Yang, et al., 2006]. The data set was composed of cleaved and
non-cleaved peptides. Each peptide is a string of amino acids, which are
non-numerical. The structure of the factor Xa protease sub-sequences was
$R_4R_3R_2R_1R_1'$, where cleavage happens between $R_1$ and $R_1'$. These five
residues were labelled as P1, P2, P3, P4 and P5 in the data. Figure 3.45(a)
shows the CART tree model for this data set and Figure 3.45(b) shows the
C5.0 tree model for this data set.
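To illustrate how a tree can split directly on amino acid letters, the following is a minimal sketch rather than the CART or C5.0 procedure itself. It assumes a simplified multiway split scored by Gini impurity, whereas CART uses binary splits and C5.0 uses an entropy-based criterion. The peptides, labels and function names are hypothetical and are not taken from the factor Xa data set.

```python
from collections import Counter

# Hypothetical toy peptides (NOT from the factor Xa data set).
# Each string holds the residues P1..P5; label 1 = cleaved, 0 = non-cleaved.
PEPTIDES = [
    ("IEGRS", 1),
    ("IDGRT", 1),
    ("AEGRA", 1),
    ("LKGAS", 0),
    ("IEPKT", 0),
    ("VDGKA", 0),
]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(data, position):
    """Weighted Gini impurity after grouping peptides by the residue
    found at `position` (one branch per amino acid letter)."""
    groups = {}
    for peptide, label in data:
        groups.setdefault(peptide[position], []).append(label)
    n = len(data)
    return sum(len(g) / n * gini(g) for g in groups.values())


def best_position(data):
    """Return the name of the position with the lowest weighted impurity,
    together with the scores of all positions."""
    width = len(data[0][0])
    scores = {f"P{i + 1}": split_impurity(data, i) for i in range(width)}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    best, scores = best_position(PEPTIDES)
    for name, score in scores.items():
        print(name, round(score, 3))
    print("best split:", best)
```

No numerical encoding of the residues is needed, because the split only asks which letter occupies a position. In this toy data the split on P4 separates the two classes completely. A multiway split of this kind tends to favour positions with many distinct residues, which is one reason practical algorithms use more careful criteria.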
Figure 3.45 (a) The CART tree model and (b) the C5.0 tree model constructed for the factor
Xa protease cleavage data.
Figure 3.46(a) shows the ROC curves as well as the AUC values for the CART
and C5.0 models constructed for the factor Xa protease cleavage data.
The AUC values were 0.856 and 0.916 for the C5.0 and CART models,
respectively. Figure 3.46(b) shows a sequence logo generated by the
ggseqlogo package [Wagih, 2017]. Comparing the upper panel and the
lower panel of Figure 3.46(b), it can be seen why P1 ($R_4$) and P5 ($R_1'$) were
selected as the most discriminative variables. This is because the amino
acid composition trends at these two residues demonstrated the greatest